IEEE Access
● Institute of Electrical and Electronics Engineers (IEEE)
Preprints posted in the last 30 days, ranked by how well they match IEEE Access's content profile, based on 31 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Collins, S. H.; De Groote, F.; Gregg, R. D.; Huang, H.; Lenzi, T.; Sartori, M.; Sawicki, G. S.; Si, J.; Slade, P.; Young, A. J.
In "Experiment-free exoskeleton assistance via learning in simulation", Luo et al. [1] present an ambitious framework for developing exoskeleton controllers through reinforcement learning exclusively in computer simulation. The authors report that a control policy trained on a small dataset from one subject was directly transferred to physical hardware, reducing human metabolic cost during walking, running, and stair climbing by more than any prior device. If confirmed, this would represent a major breakthrough for the field of wearable robotics and its clinical applications. However, a close examination of the published materials casts doubt on these claims. The reported experimental results violate physiological limits on the relationship between mechanical power and muscle energy use during gait [2-4]. The algorithmic claims are surprising and cannot be verified; in contrast with established replicability standards in machine learning [5,6], executable code has not been made available. We conclude that the goals of this study have not yet been verifiably achieved and make recommendations for avoiding publication errors of this type in the future.
Korenic, A.; Özkaya, U.; Capar, A.
Background and Objective: Variational Autoencoders (VAEs) offer a powerful framework for unsupervised anomaly detection and data clustering, often surpassing traditional methods. A core strength of VAEs lies in their ability to model data distributions probabilistically, enabling robust identification of anomalies and clusters through reconstruction likelihood -- a stochastic metric providing a principled alternative to deterministic error scores. Methods: We investigated how different VAE architectures, combining reconstruction likelihood with a learnable or data-driven prior, performed in a clustering task on a toy dataset, MNIST. Results were verified using dimensionality reduction techniques like t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), alongside clustering algorithms such as k-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). Results: The VAE's encoder inherently maps data points into a latent space exhibiting discernible cluster structure, as evidenced by alignment with ground truth labels. While dimensionality reduction techniques (both t-SNE and UMAP) facilitated the application of clustering algorithms (k-means and HDBSCAN), these methods were primarily used to visualize and interpret the latent space organization. Conclusions: This study demonstrates that VAEs effectively cluster data by implicitly encoding assignments in their latent representations. Determining cluster membership from encoder output, combined with reconstruction likelihood using semantic features, offers a principled approach for identifying typical samples and anomalies. Future research should focus on leveraging this inherent clustering capability of VAEs to enhance interpretability and facilitate clinical application.
Ramirez-Torano, F.; Hatlestad-Hall, C.; Drews, A.; Renvall, H.; Rossini, P. M.; Marra, C.; Haraldsen, I. H.; Maestu, F.; Bruna, R.
Electroencephalography (EEG) preprocessing is a critical yet time-consuming step that often relies on expert-driven, semi-automatic pipelines, limiting scalability and reproducibility across large datasets. In this work, we present sEEGnal, a fully automated and modular pipeline for EEG preprocessing designed to produce outputs comparable to expert-driven analyses while ensuring consistency and computational efficiency. The pipeline integrates three main modules: data standardization following the EEG extension of the Brain Imaging Data Structure (BIDS), bad channel detection, and artifact identification, combining physiologically grounded criteria with independent component analysis and ICLabel-based classification. Performance was evaluated against manual preprocessing performed by EEG experts at two complementary levels: preprocessing metadata (bad channels, artifact duration, and rejected components) and EEG-derived measures. In addition, test-retest analyses were conducted to assess the stability of the pipeline across repeated recordings. Results show that sEEGnal achieves performance comparable to expert-driven preprocessing while preserving key neurophysiological features. Furthermore, the pipeline demonstrates reduced variability and increased consistency compared to human experts. These findings support sEEGnal as a robust and scalable solution for automated EEG preprocessing in both research and large-scale applications. Highlights: Fully automated and modular EEG preprocessing pipeline. Benchmarked against expert-driven preprocessing. Comparable performance in metadata and EEG-derived measures. Demonstrates stable performance in test-retest recordings. BIDS-based framework for reproducible EEG data handling.
Liu, J.; Fan, J.; Deng, Z.; Tang, X.; Zhang, H.; Sharma, A.; Li, Q.; Liang, C.; Wang, A. Y.; Liu, L.; Luo, K.; Liu, H.; Qiu, H.
Background: Patient-ventilator synchrony, an essential prerequisite for non-invasive mechanical ventilation, requires accurate matching of every phase of respiration between the patient and the ventilator. Methods: We developed a long short-term memory (LSTM)-based model that can predict the inspiratory and expiratory time of the patient. This model consisted of two hidden layers, each with eight LSTM units, and was trained using a dataset of approximately 27,000 500-ms-long flow signals that captured both inspiratory and expiratory events. Results: The LSTM model achieved 97% accuracy and F1 score on the test data, and the average trigger error was less than 2.20%. In the first trial, 10 volunteers were enrolled. In "Compliance" mode, 78.6% of the triggering by the LSTM model was compatible with neuronal respiration, which was higher than the Auto-Trak model (74.2%). The Auto-Trak model performed marginally better in the pressure support = 5 and 10 cmH2O modes. Following the success of the first clinical trial, we further tested the models on five patients with acute respiratory distress syndrome (ARDS). The LSTM model exhibited 60.6% of the triggering in the 33%-box, better than the 49.0% of the Auto-Trak model, and its PVI index was significantly lower (36.5% vs 52.9%). Conclusions: Overall, the LSTM model performed comparably to, or better than, the Auto-Trak model in both latency and PVI index. While other mathematical models have been developed, our model was effectively embedded in a chip to control ventilator triggering. Trial registration: Approval Number: 2023ZDSYLL348-P01; Approval Date: 28/09/2023. Clinical Trial Registration Number: ChiCTR2500097446; Registration Date: 19/02/2025.
Jiang, Q.; Ke, Y.; Sinisterra, L. G.; Elangovan, K.; Li, Z.; Yeo, K. K.; Jonathan, Y.; Ting, D. S. W.
Coronary artery disease is a leading cause of morbidity and mortality. Invasive coronary angiography is currently the gold standard in disease diagnosis. Several studies have attempted to use artificial intelligence (AI) to automate its interpretation, with varying levels of success. However, most existing studies cannot generate detailed angiographic reports beyond simple classification or segmentation. This study aims to fine-tune and evaluate the performance of a Vision-Language Model (VLM) in coronary angiogram interpretation and report generation. Using twenty thousand angiogram keyframes from 1987 patients collated across four unique datasets, we fine-tuned the InternVL2-4B model with Low-Rank Adaptor weights to perform stenosis detection, anatomy labelling, and report generation. The fine-tuned VLM achieved a precision of 0.56, recall of 0.64, and F1-score of 0.60 for stenosis detection. In anatomy segmentation, it attained a weighted precision of 0.50, recall of 0.43, and F1-score of 0.46, with higher scores in major vessel segments. Report generation integrating multiple angiographic projection views yielded an accuracy of 0.42, a negative predictive value of 0.58, and a specificity of 0.52. This study demonstrates the potential of using a VLM to streamline angiogram interpretation to rapidly provide actionable information to guide management, support care in resource-limited settings, and audit the appropriateness of coronary interventions. Author summary: Coronary artery disease carries a heavy disease burden worldwide, and coronary angiography is the gold-standard imaging for its diagnosis. Interpreting these complex images and producing clinical reports require significant expertise and time. In this study, we fine-tuned and investigated an open-source VLM, InternVL2-4B, to interpret and report coronary angiogram images in key tasks including stenosis detection, anatomy identification, and full report generation.
We also benchmarked the fine-tuned InternVL2-4B against a state-of-the-art segmentation model, YOLOv8x, evaluated on the same test sets. We examined how machine learning metrics like the intersection-over-union score may not fully capture the clinical accuracy of model predictions and discussed the limitations of relying solely on these metrics for evaluating clinical AI systems. Although the model has not yet achieved expert-level interpretation, our results demonstrate the potential and feasibility of automating the reporting of coronary angiograms. Such systems could assist cardiologists by improving reporting efficiency, highlighting lesions that may require review, and enabling automated calculation of clinical scores such as the SYNTAX score.
Protserov, S.; Repalo, A.; Mashouri, P.; Hunter, J.; Masino, C.; Madani, A.; Brudno, M.
Machine learning models have seen considerable success in the medical image segmentation domain. However, one of the challenges they face is confounders, or shortcuts: spurious correlations or biases in the training data that affect the resulting models. One example of such confounders for surgical machine learning is the setup of surgical equipment, including tools and lighting. Using the task of identifying safe and dangerous zones of dissection in laparoscopic cholecystectomy images and videos as a use case, we inspect two equipment-induced biases: the presence of surgical tools in the field of view and the position of lighting. We propose methods for evaluating the severity of these biases and augmentation-based methods for mitigating them. We show that our tool bias mitigations improve the models' consistency under tool movements by 9 percentage points in the most inconsistent cases, and by 4 percentage points on average. Our lighting bias mitigations help reduce the fraction of true dangerous-zone pixels that may be predicted as safe under light changes from 5% to 1.5%, without compromising segmentation quality.
Mahmoudi, A.; Firouzi, V.; Rinderknecht, S.; Seyfarth, A.; Sharbafi, M. A.
Optimizing assistive wearable devices is crucial for their efficacy and user adoption, yet state-of-the-art methods like Human-in-the-Loop Optimization (HILO) and biomechanical modeling face limitations. HILO is time-consuming and often restricted to optimizing control parameters, while inverse dynamics assumes invariant kinematics, which is unreliable for adaptive human-device interaction. Predictive simulation offers a powerful alternative, enabling computational exploration of design spaces. However, existing approaches often lack systematic optimization frameworks and rigorous validation against experimental data. To address this, we developed a Design Optimization Platform that integrates predictive simulations within a two-level optimization structure for personalizing assistive device design. This paper primarily validates the platform's predictive simulations against a publicly available dataset of the passive Biarticular Thigh Exosuit (BATEX), assessing its reliability. Our findings show that the model sufficiently predicts the kinematics and major muscle activations, except for pelvis tilt and some biarticular muscles. The key finding is that successful identification of personalized optimal BATEX stiffness parameters requires acceptable prediction of metabolic cost trends, not their precise values. Our analysis further reveals that the model's accuracy in predicting Vasti muscle activation in the baseline condition is a significant indicator of its success in predicting metabolic cost trends. This demonstrates that accurate prediction of performance trends is more important for effective simulation-based design optimization than perfect biomechanical accuracy, advancing targeted and efficient assistive device development.
Chen, Z.; Hu, T.; Haddadin, S.; Franklin, D.
There is more to musculotendon path modeling than aligning a cable to reflect the geometric features of a muscle-tendon unit. From the perspective of simulation accuracy, the key is to replicate the length- and moment arm-joint angle relations of the target muscle. In this study, we propose an effect-oriented approach to automated path modeling, via hybrid calibration based on muscle surface mesh and moment arm. The task is formulated as an optimization problem with a threefold objective for the path to: 1) pass through multiple ellipses representing muscle cross-sections, 2) yield moment arms that match experimental measurements, and 3) yield moment arms with the designated signs. The performance of our optimization framework is demonstrated with the musculoskeletal surface mesh from the Visible Human Male and moment arm datasets from the literature -- producing 42 paths that are anatomically realistic and biomechanically accurate in 20.1 min. Our optimization framework uses specified gradients, which is faster and more accurate than using the default numerical gradient, making it applicable for large-scale subject-specific uses.
Hopenfeld, B.
A multiple-channel QRS detector is described. The detector partitions raw signal segments into peak domains, extracts parameters associated with the peak domains, and scores peaks based on these parameters. A multi-layer perceptron (MLP) with 11 inputs generates provisional peak scores, which are refined through application of rules involving 20-30 parameters. An optimal sequence of supra-threshold peaks is determined. Separately, combinatorial optimization determines an optimal structured heart rhythm sequence. Adjudication between the general supra-threshold sequence and the structured sequence depends on noise level, peak quality, and rhythm structure quality. For multiple-channel fusion, peak scores are determined as a noise-weighted function of channel peak scores. The MLP was trained on approximately 70% of channel 1 of the MIT-BIH Arrhythmia Database. The supplementary rules were heuristically chosen over all channel 1 records. The sensitivity (SE) and positive predictive value (PPV) of the detector applied to channel 2 were a function of the noise threshold used to discard segments. At a noise level that would exclude 2.2% of channel 1 data, the SE and PPV were 99.67% and 99.75%, respectively. Importantly, even in high noise, the detector was able to track large-scale features of heart rhythm. Fused channel 1 and channel 2 SE and PPV were 99.96% and 99.98%, respectively. The present algorithm points the way toward maximal extraction of heart rhythm information from noisy signals, and the potential to reduce false alarms generated by automated rhythm analysis software.
Qiu, P.; An, Z.; Ha, S.; Kumar, S.; Yu, X.; Sotiras, A.
Multimodal medical image analysis exploits complementary information from multiple data sources (e.g., multi-contrast Magnetic Resonance Imaging (MRI), Diffusion Tensor Imaging (DTI), and Positron Emission Tomography (PET)) to enhance diagnostic accuracy and support clinical decision making. Central to this process is the learning of robust representations that capture both modality-invariant and modality-specific features, which can then be leveraged for downstream tasks such as MRI segmentation and normative modeling of population-level variation and individual deviations. However, learning robust and generalizable representations becomes particularly challenging in the presence of missing modalities and heterogeneous data distributions. Most existing methods address this challenge primarily from a statistical perspective, yet they lack a theoretical understanding of the underlying geometric behavior, such as how probability mass is allocated across modalities. In this paper, we introduce a generalized geometric perspective for multimodal representation learning grounded in the concept of barycenters, which unifies a broad class of existing methods under a common theoretical perspective. Building on this barycentric formulation, we propose a novel approach that leverages generalized Wasserstein barycenters with hierarchical modality-specific priors to better preserve the geometry of unimodal distributions and enhance representation quality. We evaluated our framework on two key multimodal tasks, brain tumor MRI segmentation and normative modeling, demonstrating consistent improvements over a variety of multimodal approaches. Our results highlight the potential of scalable, theoretically grounded approaches to advance robust and generalizable representation learning in medical imaging applications.
Roca, M.; Messuti, G.; Klepachevskyi, D.; Angiolelli, M.; Bonavita, S.; Trojsi, F.; Demuru, M.; Troisi Lopez, E.; Chevallier, S.; Yger, F.; Saudargiene, A.; Sorrentino, P.; Corsi, M.-C.
Neurodegenerative diseases such as Mild Cognitive Impairment (MCI), Multiple Sclerosis (MS), Parkinson's Disease (PD), and Amyotrophic Lateral Sclerosis (ALS) are becoming more prevalent. Each of these diseases, despite its specific pathophysiological mechanisms, leads to widespread reorganization of brain activity. However, the corresponding neurophysiological signatures of these changes have been elusive. As a consequence, to date, it is not possible to effectively distinguish these diseases from neurophysiological data alone. This work uses Magnetoencephalography (MEG) resting-state data, combined with interpretable machine learning techniques, to support differential diagnosis. We expand on previous work and design a Riemannian geometry-based classification pipeline. The pipeline is fed with typical connectivity metrics, such as covariance or correlation matrices. To maintain interpretability while reducing feature dimensionality, we introduce a classifier-independent feature selection procedure that uses effect sizes derived from the Kruskal-Wallis test. The ensemble classification pipeline, called REDDI, achieved a mean balanced accuracy of 0.81 (+/-0.04) across five folds, representing a 13% improvement over the state-of-the-art, while remaining clinically transparent. As such, our approach provides a reliable, interpretable, data-driven, operator-independent decision-support tool for Neurology.
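The feature selection step described in this abstract ranks features by effect sizes derived from the Kruskal-Wallis test. A minimal sketch, not the REDDI code: we use the common eta-squared estimate eta^2 = (H - k + 1) / (n - k) for k groups and n samples, which may differ from the authors' exact choice, and the function name is ours.

```python
import numpy as np
from scipy.stats import kruskal

def kw_effect_sizes(X, y):
    """Classifier-independent feature ranking: for each column of X,
    compute the Kruskal-Wallis H statistic across the class groups in y
    and convert it to an eta-squared effect size (assumed estimator)."""
    classes = np.unique(y)
    n, k = len(y), len(classes)
    effects = []
    for j in range(X.shape[1]):
        groups = [X[y == c, j] for c in classes]
        H, _ = kruskal(*groups)          # rank-based one-way test
        effects.append((H - k + 1) / (n - k))
    return np.array(effects)
```

Features would then be kept in decreasing order of effect size, independently of whatever classifier is trained downstream.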
Swee, S.; Adam, I.; Zheng, E. Y.; Ji, E.; Wang, D.; Speier, W.; Hsu, J.; Chang, K.-W.; Shivkumar, K.; Ping, P.
Ambulatory electrocardiography (ECG) provides continuous monitoring of the heart's electrical activity. However, many existing machine learning and artificial intelligence models for analyzing ambulatory ECG traces are unimodal and do not incorporate patient clinical context. In this study, we propose a multimodal framework integrating ambulatory ECG-derived representations with clinical text embeddings to predict two cardiac outcomes: sudden cardiac death and pump failure death. Ambulatory ECG traces are preprocessed, segmented, and encoded via a multiple instance learning and temporal convolutional neural network framework. In parallel, patient clinical features are parsed into structured prompts, which are passed through a large language model to generate clinical reasoning; this reasoning passes through a biomedical language encoder to generate a text embedding. With the ECG and text embeddings, we systematically evaluate multiple fusion strategies, including concatenation- and gating-based approaches, to integrate these two data modalities. Our results demonstrate that multimodal models consistently outperform unimodal baselines, with adaptive fusion mechanisms providing the greatest improvements in predictive performance. Decision curve analysis highlights the potential clinical utility of the proposed framework for risk stratification. Finally, we visualize model attention across modalities, including ECG attention patterns, segment-level saliency, heart rate variability features, and clinical reasoning, to contextualize patient-specific predictions.
Jung, S.; Thomson, S.
Continuous, non-invasive cardiovascular monitoring is limited by the superficial sensing depth of Photoplethysmography (PPG), which is susceptible to peripheral artifacts. This study evaluates a wearable dual-modality prototype integrating dry-electrode Impedance Plethysmography (IPG) and PPG within a smartwatch form factor. Results from a pilot study (N=2) demonstrate that IPG signals exhibit a temporal lead over PPG across ventral and dorsal sites, supporting its greater penetration depth. During brachial artery modulation, IPG showed superior sensitivity to arterial recovery on the ventral forearm. Furthermore, 60-minute napping sessions revealed that while PPG remained morphologically stable, IPG signals underwent significant evolution, capturing distinct pulse-wave archetypes. These findings suggest that wearable IPG provides a high-fidelity window into deep systemic hemodynamics typically reserved for clinical instrumentation.
Agumba, J.; Erick, S.; Pembere, A.; Nyongesa, J.
Objectives: To develop and evaluate a deployable deep learning system with Gradient-weighted Class Activation Mapping (Grad-CAM) for tuberculosis screening from chest radiographs and to assess its classification performance and explainability across desktop and mobile deployment platforms. Materials and methods: This study used publicly available chest X-ray datasets containing Normal and Tuberculosis images. A DenseNet121-based transfer learning model was trained using stratified training, validation, and test splits with data augmentation and class weighting. Model performance was evaluated using accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC). Grad-CAM was used to visualize regions influencing model predictions. The trained model was converted to TensorFlow Lite and deployed in both a Windows desktop application and a Flutter-based mobile application for offline inference and visualization. Results: The model demonstrated strong classification performance on the independent test dataset, with high accuracy and AUC values indicating effective discrimination between Normal and Tuberculosis cases. Grad-CAM visualizations showed that the model focused primarily on anatomically relevant lung regions, particularly the upper and mid-lung fields in Tuberculosis cases. Deployment testing confirmed consistent prediction outputs and Grad-CAM visualizations across both Windows and mobile platforms. Conclusion: The proposed deployable deep learning system with Grad-CAM provides accurate and interpretable tuberculosis screening from chest radiographs and demonstrates feasibility for offline mobile and desktop deployment. This approach has potential as an artificial intelligence-assisted screening and decision support tool in radiology, particularly in resource-limited and remote healthcare settings.
Vale, B.; Correia, M. M.; Figueiredo, P.
Resting-state functional MRI has been widely used to study brain connectivity, yet the test-retest reliability of commonly used metrics remains a concern. To improve reliability, extended scan lengths and larger subject cohorts are often recommended. However, these solutions can be impractical and pose challenges, particularly in studies of clinical populations. Here, we systematically assess the reliability of two main types of functional connectivity measures: node-based connectome metrics (edge-level intraclass correlation coefficient [ICC], connectome-level ICC, functional connectivity fingerprinting, and discriminability); and voxel-based resting-state networks (RSNs) (spatial similarity of independent component analysis [ICA]-derived RSN maps quantified using the Dice coefficient). Using data from the Human Connectome Project, we evaluated the effects of scan length (3.6, 7.2, 10.8, and 14.4 minutes) and number of participants (n = 10, 20, 50, and 100) on both within-session and between-session reliability. We found that multivariate connectome metrics demonstrated greater reliability than edge-level measures, and that scan length had a stronger influence on test-retest reliability than the number of participants. For connectome metrics, 14 minutes of scanning and a cohort of approximately 20 participants were sufficient to achieve reliable estimates. In contrast, RSN measures benefited from larger cohort sizes. Our findings provide practical guidelines for designing resting-state fMRI studies in terms of scan length and number of participants, balancing reliability and feasibility. Ultimately, protocol choices should be guided by the specific study objectives and the functional connectivity metric of interest.
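For readers unfamiliar with the edge-level reliability metric this abstract relies on, the classical two-way random-effects, absolute-agreement, single-measure ICC(2,1) on a subjects-by-sessions matrix can be computed from the standard ANOVA decomposition. This is an illustrative sketch following the Shrout-Fleiss formulation, not the authors' pipeline; the function name is ours.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1) for a (subjects x sessions) matrix: two-way
    random-effects, absolute agreement, single measure."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    row_m = data.mean(axis=1)            # subject means
    col_m = data.mean(axis=0)            # session means
    ss_rows = k * np.sum((row_m - grand) ** 2)
    ss_cols = n * np.sum((col_m - grand) ** 2)
    ss_err = np.sum((data - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

In the edge-level case, one such matrix is formed per connectome edge (one value per subject per session), and the ICC is computed edge by edge.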
Frost, H. R.
We describe an approach for analyzing biological networks using rows of the Krylov subspace of the adjacency matrix. Specifically, we explore the scenario where the Krylov subspace matrix is computed via power iteration using a non-random and potentially non-uniform initial vector that captures a specific biological state or perturbation. In this case, the rows of the Krylov subspace matrix (i.e., Krylov trajectories) carry important functional information about the network nodes in the biological context represented by the initial vector. We demonstrate the utility of this approach for community detection and perturbation analysis using the C. elegans neural network.
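The construction this abstract describes can be sketched in a few lines (a minimal illustration under our own naming and normalization assumptions, not the author's code): each column is one power-iteration step applied to the chosen initial vector, and row i of the resulting matrix is the Krylov trajectory of node i.

```python
import numpy as np

def krylov_trajectories(A, v0, steps, normalize=True):
    """Build the Krylov matrix [v0, A v0, A^2 v0, ...] by power
    iteration from a chosen (non-random) initial state v0.
    Row i is the Krylov trajectory of node i."""
    v = np.asarray(v0, dtype=float)
    cols = [v]
    for _ in range(steps):
        v = A @ v
        if normalize:                    # optional: keep magnitudes comparable
            nrm = np.linalg.norm(v)
            if nrm > 0:
                v = v / nrm
        cols.append(v)
    return np.column_stack(cols)         # shape (n_nodes, steps + 1)
```

Trajectories (rows) could then be clustered for community detection, or compared across initial vectors encoding different perturbations.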
Tan, J.; Tang, P. H.
Background: Paediatric pneumonia is a leading cause of childhood morbidity and mortality worldwide. Chest X-rays (CXR) are an important diagnostic tool in the diagnosis of pneumonia, but shortages in specialist radiology services lead to clinically significant delays in CXR reporting. The ability to communicate findings both to clinicians and laypersons allows MLLMs to be deployed throughout clinical workflows, from image analysis to patient communication. However, MLLMs currently underperform state-of-the-art deep learning classifiers. Objective: To evaluate the diagnostic accuracy of ensemble strategies with MLLMs compared to the baseline average agent for paediatric radiological pneumonia detection. Methods: We conducted a retrospective cohort study using paediatric CXRs from two independent hospital datasets totalling 2300 CXRs. Fifteen MedGemma-4B-it agents independently classified each CXR into five pneumonia likelihood categories. Majority voting, soft voting, and GPTOSS-20B aggregation were compared against the average agent performance. The primary metric evaluated was one-vs-rest (OvR) AUROC. Secondary metrics included accuracy, sensitivity, specificity, F1-score, Cohen's kappa, and one-vs-one (OvO) AUROC. Results: Soft voting achieved improvements in OvR AUROC (p_balanced = 0.0002, p_real-world = 0.0003), accuracy (p_balanced = 0.0008, p_real-world < 0.0001), Cohen's kappa (p_balanced = 0.0006, p_real-world = 0.0054) and OvO AUROC (p_balanced < 0.0001, p_real-world = 0.0011) across both datasets, and a superior F1-value (p_balanced = 0.0028) for the balanced dataset. Conclusion: Soft voting enhances MedGemma's diagnostic discriminatory performance for paediatric radiological pneumonia detection. Our system enables privacy-preserving, near real-time clinical decision support with explainable outputs, having potential for integration into emergency departments. Our system's high specificity supports triage by flagging high-risk radiological pneumonia cases.
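The soft-voting strategy compared in this abstract amounts to averaging the per-class probabilities of the independent agents before taking the final decision. A minimal sketch, our own illustration rather than the authors' pipeline; the array layout and function name are assumptions:

```python
import numpy as np

def soft_vote(agent_probs):
    """agent_probs: (n_agents, n_samples, n_classes) array of
    per-agent class probabilities. Returns the averaged
    probabilities and the winning class per sample."""
    p = np.mean(np.asarray(agent_probs, dtype=float), axis=0)
    return p, p.argmax(axis=-1)
```

Majority voting, by contrast, would discard the probabilities and count each agent's single most likely class, which loses the calibration information soft voting exploits.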
Usuzaki, T.; Matsunbo, E.; Inamori, R.
Despite the remarkable progress of artificial intelligence, exemplified by large language models, how AI technologies can contribute to the construction of evidence in evidence-based medicine (EBM) remains an overlooked issue. What is needed now is AI that is compatible with EBM. In the present paper, we propose an example analysis that may contribute to this approach using a variable Vision Transformer.
Rahjouei, A.
Actigraphy is widely used for long-term sleep monitoring, but established sleep-wake scoring algorithms often require parameter tuning, which is commonly performed manually and can reduce reproducibility. In this study, we present a grid-search-based calibration framework for established actigraphy algorithms and evaluate whether it can serve as a practical alternative to manual tuning. The method was evaluated using two datasets: a multi-subject polysomnography-validated actigraphy dataset and a self-collected dual-device dataset. In the polysomnography-validated dataset, grid-search optimization produced performance patterns similar to manual parameter selection, while slightly improving detection of sleep onset and sleep offset and yielding modest gains in wake-sensitive metrics. In the dual-device dataset, consensus and majority voting were useful for reducing the influence of brief wake episodes occurring within the main sleep period, including micro-awakenings that can fragment sleep predictions across individual algorithms. Overall, these findings show that grid search can replace manual parameter tuning with a more explicit and reproducible procedure while providing small improvements in sleep timing estimation and benefiting ensemble-based handling of within-sleep wakefulness.
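The calibration procedure this abstract describes is, at its core, an exhaustive sweep over a parameter grid scored against reference labels. A minimal sketch, our own illustration: it assumes a generic scoring function and an accuracy criterion, whereas the actual study may optimize different sleep-wake metrics.

```python
from itertools import product
import numpy as np

def grid_search_calibrate(score_fn, param_grid, activity, labels):
    """Exhaustively evaluate every parameter combination of a
    sleep-wake scoring function against reference labels and
    return the best combination plus its accuracy."""
    best_params, best_acc = None, -1.0
    keys = list(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        pred = score_fn(activity, **params)
        acc = float(np.mean(np.asarray(pred) == np.asarray(labels)))
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc
```

Because the sweep is fully specified by the grid, rerunning it on the same data reproduces the same parameters, which is the reproducibility advantage over manual tuning.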
Zhang, R.
Wearable devices generate dense longitudinal heart rate (HR) data, but summarizing sustained heart rate elevation in a single daily metric remains challenging. We developed the heart rate persistence index (HRPI), defined as the largest integer k such that at least k minutes in a day have HR ≥ k bpm. For example, an HRPI of 105 means daily HR was ≥105 bpm for at least 105 minutes. HRPI is threshold-free and integrates magnitude and duration of elevated HR into a single interpretable value. Using multi-day wearable recordings from a PhysioNet dataset, we show that HRPI captures structure beyond mean HR, reflects variability-related features, and exhibits robust day-to-day stability. In an independent healthy cohort, HRPI declines strongly with age, supporting physiological relevance. HRPI offers a compact, interpretable, and robust summary of sustained HR elevation for longitudinal wearable studies, providing information easily accessible to both specialists and nonspecialists.
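As defined in this abstract, HRPI is an h-index-style statistic over a day's minute-level HR values, and a minimal sketch is straightforward (an illustration of the stated definition only; the function name and the minute-level input format are our assumptions, not the authors' code):

```python
def hrpi(minute_hr):
    """Heart rate persistence index: the largest integer k such that
    at least k minutes have HR >= k bpm (h-index-style definition
    over one day of minute-level HR values)."""
    k = 0
    for i, v in enumerate(sorted(minute_hr, reverse=True), start=1):
        if v >= i:      # the i-th highest minute still clears i bpm
            k = i
        else:
            break
    return k
```

On a day with 120 minutes at 110 bpm and the rest at 60 bpm, this returns 110: at least 110 minutes sit at or above 110 bpm, but not 111 minutes at or above 111 bpm.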